12 research outputs found

    Hybrid Metaheuristic Methods for Ensemble Classification in Non-stationary Data Streams

    The extensive growth of digital technologies has led to new challenges in terms of processing and distilling insights from data that is generated continuously in real time. To address this challenge, several data stream mining techniques, where each instance of data is typically processed once on its arrival (i.e. online), have been proposed. However, such techniques often perform poorly over non-stationary data streams, where the distribution of data evolves over time in unforeseen ways. To ensure the predictive ability of a computational model working with evolving data, appropriate data stream mining techniques capable of adapting to different types of concept drifts are required. So far, ensemble-based learning methods are among the most popular techniques employed for performing data stream classification tasks in the presence of concept drifts. In ensemble learning, multiple learners forming an ensemble are trained to obtain a better predictive performance compared to that of a single learner. This thesis aims to propose and investigate novel hybrid metaheuristic methods for performing classification tasks in non-stationary environments. In particular, the thesis offers the following three main contributions. First, it presents the Evolutionary Adaptation to Concept Drifts (EACD) method that uses two evolutionary algorithms, namely, Replicator Dynamics (RD) and Genetic Algorithm (GA). According to this method, an ensemble of different classification types is created based on various feature sets (called subspaces) randomly drawn from the target data stream. These subspaces are allowed to grow or shrink based on their performance using RD, while their combinations are optimised using GA. As the second contribution, this thesis proposes the REplicator Dynamics & GENEtic (RED-GENE) algorithm. RED-GENE builds upon the EACD method and employs the same approach to creating different classification types and the same GA optimisation technique. At the same time, RED-GENE improves the EACD method by proposing three different modified versions of RD to accelerate the concept drift adaptation process. The third contribution of the thesis is the REplicator Dynamics & Particle Swarm Optimisation (RED-PSO) algorithm that is based on a three-layer architecture to produce classification types of different sizes. The selected feature combinations in all classification types are optimised using a non-canonical version of the Particle Swarm Optimisation (PSO) technique for each layer individually. An extensive set of experiments using both synthetic and real-world data streams demonstrates the effectiveness of the three proposed methods, along with the statistical significance of their results relative to state-of-the-art algorithms. The proposed methods are then compared with each other, which shows that each has its own strengths in concept drift adaptation for non-stationary data stream classification. This has led us to formulate a list of suggestions on when to use each of the proposed methods with regard to different applications and environments

    Ensemble Dynamics in Non-stationary Data Stream Classification

    Data stream classification is the process of learning supervised models from continuous labelled examples in the form of an infinite stream that, in most cases, can be read only once by the data mining algorithm. One of the most challenging problems in this process is how to learn such models in non-stationary environments, where the data/class distribution evolves over time. This phenomenon is called concept drift. Ensemble learning techniques have proven effective in adapting to concept drifts. Ensemble learning is the process of learning a number of classifiers and combining them to predict incoming data using a combination rule. These techniques should incrementally process and learn from existing data within limited memory and time to predict incoming instances, and also cope with different types of concept drifts, including incremental, gradual, abrupt and recurring drifts. A large number of applications can benefit from classification of non-stationary data streams, including weather forecasting, stock market analysis, spam filtering systems, credit card fraud detection, traffic monitoring and sensor data analysis in Internet of Things (IoT) networks, to mention a few. Since each application has its own characteristics and conditions, it is difficult to introduce a single approach that would be suitable for all problem domains. This chapter studies the dynamic behaviour of existing ensemble methods (e.g. addition, removal and update of classifiers) in non-stationary data stream classification. It proposes a new, compact, yet informative formalisation of state-of-the-art methods. The chapter also presents results of our experiments comparing a diverse selection of best performing algorithms when applied to several benchmark data sets with different types of concept drifts from different problem domains
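    The "combination rule" mentioned in the abstract above can take many forms. As a rough illustration only, the sketch below uses a weighted majority vote, which is one common choice and is not claimed to be the rule studied in the chapter. The function name `weighted_vote` and the example weights are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per classifier into a single label.

    Assumption: each classifier carries a non-negative weight (for example
    its recent accuracy); the label with the largest total weight wins.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three hypothetical ensemble members voting on one incoming instance.
equal_weights = weighted_vote(["spam", "ham", "ham"], [1.0, 1.0, 1.0])
accuracy_weights = weighted_vote(["spam", "ham", "ham"], [0.9, 0.3, 0.3])
```

    With equal weights the majority label wins; with accuracy-based weights a single strong member can outvote two weak ones.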

    AdaDeepStream: streaming adaptation to concept evolution in deep neural networks

    Typically, Deep Neural Networks (DNNs) are not responsive to changing data. Novel classes will be incorrectly labelled as a class that the network was previously trained to recognise. Ideally, a DNN would be able to detect changing data and adapt rapidly with minimal true-labelled samples and without catastrophically forgetting previous classes. In the Online Class Incremental (OCI) field, research focuses on remembering all previously known classes. However, real-world systems are dynamic, and it is not essential to recall all classes forever. The Concept Evolution field studies the emergence of novel classes within a data stream. This paper aims to bring together these fields by analysing OCI Convolutional Neural Network (CNN) adaptation systems in a concept evolution setting by applying novel classes in patterns. Our system, termed AdaDeepStream, offers a dynamic concept evolution detection and CNN adaptation system using minimal true-labelled samples. We apply activations from within the CNN to fast streaming machine learning techniques. We compare two activation reduction techniques. We conduct a comprehensive experimental study and compare our novel adaptation method with four other state-of-the-art CNN adaptation methods. Our entire system is also compared to two other novel class detection and CNN adaptation methods. The results of the experiments are analysed based on accuracy, speed of inference and speed of adaptation. On accuracy, AdaDeepStream outperforms the next best adaptation method by 27% and the next best combined novel class detection/CNN adaptation method by 24%. On speed, AdaDeepStream is among the fastest to process instances and adapt

    A review of natural language processing in contact centre automation

    Contact centres have been highly valued by organizations for a long time. However, the COVID-19 pandemic has highlighted their critical importance in ensuring business continuity, economic activity, and quality customer support. The pandemic has led to an increase in customer inquiries related to payment extensions, cancellations, and stock inquiries, each with varying degrees of urgency. To address this challenge, organizations have taken the opportunity to re-evaluate the function of contact centres and explore innovative solutions. Next-generation platforms that incorporate machine learning techniques and natural language processing, such as self-service voice portals and chatbots, are being implemented to enhance customer service. These platforms offer robust features that equip customer agents with the necessary tools to provide exceptional customer support. Through an extensive review of existing literature, this paper aims to uncover research gaps and explore the advantages of transitioning to a contact centre that utilizes natural language solutions as the norm. Additionally, we will examine the major challenges faced by contact centre organizations and offer reco

    RED-GENE: An Evolutionary Game Theoretic Approach to Adaptive Data Stream Classification

    The extensive growth of digital technologies such as the Internet of Things (IoT), social media networks and forecasting systems has led to new challenges regarding computational complexity and big data mining. The classification task in such applications is not trivial due to the high volume of related data and the limited time available for the task. It is particularly difficult when dealing with data streams, where each instance of data is typically processed once on its arrival (i.e. online) while the underlying data distribution often changes due to the changing environment. In this paper, we propose a novel ensemble-based framework called Replicator Dynamics & Genetic Algorithms Approach (RED-GENE) for effective data stream classification in changing environments that lead to concept drifts (i.e. evolution of data streams). RED-GENE employs three novel Replicator Dynamics (RD) strategies along with a Genetic Algorithm (GA) optimisation technique to flexibly adapt to different types of concept drifts when performing data stream classification tasks. The proposed framework works as follows. First, a set of random feature combinations is drawn from a given pool of features of the target data stream to create different classification types. Next, RD is used to allow the classification types achieving higher classification accuracy to grow and those with lower accuracy to shrink. A modified version of the classic GA is then employed to optimise the randomly drawn combinations of features in each classification type. The proposed framework was tested using nine data streams (including both real-world and synthetic datasets) to investigate different variations of the proposed framework and compare its performance to other state-of-the-art algorithms using immediate and delayed prequential evaluation methods. The results demonstrated that the proposed framework can provide the best accuracy on average when compared with five other state-of-the-art algorithms
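    The grow/shrink step described above can be illustrated with the textbook discrete-time replicator equation, in which each type's share is rescaled by its fitness relative to the population mean. This is only a minimal sketch under that assumption: it is not one of the three RD strategies proposed in the paper, and the name `replicator_step` and the accuracy values are hypothetical.

```python
def replicator_step(shares, fitness):
    """One discrete-time replicator update.

    shares  -- current proportion of the ensemble held by each type (sums to 1)
    fitness -- non-negative fitness per type (assumed here: recent accuracy)
    """
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    if mean_fitness == 0:
        # No type has any fitness: leave the proportions unchanged.
        return list(shares)
    # Types fitter than the mean grow; types below the mean shrink.
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]


# Three hypothetical classification types starting with equal shares.
shares = [1 / 3, 1 / 3, 1 / 3]
accuracy = [0.9, 0.6, 0.3]
new_shares = replicator_step(shares, accuracy)
```

    After one step the most accurate type holds a larger share and the least accurate a smaller one, while the shares still sum to one.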

    EACD: evolutionary adaptation to concept drifts in data streams

    This paper presents a novel ensemble learning method based on evolutionary algorithms to cope with different types of concept drifts in non-stationary data stream classification tasks. In ensemble learning, multiple learners forming an ensemble are trained to obtain a better predictive performance compared to that of a single learner, especially in non-stationary environments, where data evolve over time. The evolution of data streams can be viewed as a problem of changing environment, and evolutionary algorithms offer a natural solution to this problem. The method proposed in this paper uses random subspaces of features from a pool of features to create different classification types in the ensemble. Each such type consists of a limited number of classifiers (decision trees) that have been built at different times over the data stream. An evolutionary algorithm (replicator dynamics) is used to adapt to different concept drifts; it allows the types with a higher performance to increase and those with a lower performance to decrease in size. A genetic algorithm is then applied to build a two-layer architecture based on the proposed technique to dynamically optimise the combination of features in each type to achieve a better adaptation to new concepts. The proposed method, called EACD, offers both implicit and explicit mechanisms to deal with concept drifts. A set of experiments employing four artificial and five real-world data streams is conducted to compare its performance with that of the state-of-the-art algorithms using the immediate and delayed prequential evaluation methods. The results demonstrate favourable performance of the proposed EACD method in different environments
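    The immediate prequential evaluation mentioned above is commonly described as "test-then-train": each instance is first used to test the model and only then to update it. The sketch below illustrates that loop under stated assumptions. `MajorityClassLearner` is a hypothetical toy learner standing in for an ensemble, and the delayed variant, in which labels arrive some time after the prediction, is not shown.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each instance first, then train on it (immediate labels)."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# A tiny made-up stream of (features, label) pairs.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
acc = prequential_accuracy(stream, MajorityClassLearner())
```

    Because every instance is tested before it is learned, no separate hold-out set is needed, which suits streams that can be read only once.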

    A Non-Canonical Hybrid Metaheuristic Approach to Adaptive Data Stream Classification

    Data stream classification techniques have been playing an important role in big data analytics recently due to their diverse applications (e.g. fraud and intrusion detection, forecasting and healthcare monitoring systems) and the growing number of real-world data stream generators (e.g. IoT devices and sensors, websites and social network feeds). Streaming data is often prone to evolution over time. In this context, the main challenge for computational models is to adapt to changes, known as concept drifts, using data mining and optimisation techniques. We present a novel ensemble technique called RED-PSO that seamlessly adapts to different concept drifts in non-stationary data stream classification tasks. RED-PSO is based on a three-layer architecture to produce classification types of different sizes, each created by randomly selecting a certain percentage of features from a pool of features of the target data stream. An evolutionary algorithm, namely, Replicator Dynamics (RD), is used to seamlessly adapt to different concept drifts; it allows well-performing types to grow and poorly performing ones to shrink in size. In addition, the selected feature combinations in all classification types are optimised using a non-canonical version of the Particle Swarm Optimisation (PSO) technique for each layer individually. PSO allows the types in each layer to move towards local (within the same type) and global (in all types) optima with a specified velocity. A set of experiments is conducted to compare the performance of the proposed method to state-of-the-art algorithms using real-world and synthetic data streams in immediate and delayed prequential evaluation settings. The results show a favourable performance of our method in different environments
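    For reference, the movement towards local and global optima with a velocity can be illustrated with the canonical PSO update, where the new velocity blends inertia, attraction to a local best and attraction to a global best. This sketch shows the canonical rule only; the abstract states that RED-PSO uses a non-canonical variant whose details are not given here. The function name `pso_step` and the coefficient values are assumptions for illustration.

```python
import random


def pso_step(position, velocity, local_best, global_best,
             inertia=0.7, c_local=1.5, c_global=1.5, rng=random):
    """One canonical PSO update for a single particle.

    All vectors are lists of floats of equal length. The coefficients are
    illustrative defaults, not values taken from the paper.
    """
    new_position = []
    new_velocity = []
    for x, v, p, g in zip(position, velocity, local_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # Inertia term plus random pulls towards the local and global bests.
        v_next = inertia * v + c_local * r1 * (p - x) + c_global * r2 * (g - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# A particle already sitting on both bests with zero velocity stays put.
still_pos, still_vel = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0])

# A particle at the origin is pulled towards bests located at 1.0.
moved_pos, moved_vel = pso_step([0.0], [0.0], [1.0], [1.0],
                                rng=random.Random(0))
```

    Passing a seeded `random.Random` instance as `rng` makes a run reproducible.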

    An Ensemble-Learning-Based Technique for Bimodal Sentiment Analysis

    No full text
    Human communication is predominantly expressed through speech and writing, which are powerful mediums for conveying thoughts and opinions. Researchers have been studying the analysis of human sentiments for a long time, including the emerging area of bimodal sentiment analysis in natural language processing (NLP). Bimodal sentiment analysis has gained attention in various areas such as social opinion mining, healthcare, banking, and more. However, there is a limited amount of research on bimodal conversational sentiment analysis, which is challenging due to the complex nature of how humans express sentiment cues across different modalities. To address this gap in research, a comparison of multiple data modality models has been conducted on the widely used MELD dataset, which serves as a benchmark for sentiment analysis in the research community. The results show the effectiveness of combining acoustic and linguistic representations using a proposed neural-network-based ensemble learning technique over six transformer and deep-learning-based models, achieving state-of-the-art accuracy

    A Review of Natural Language Processing in Contact Centre Automation

    Contact centres have been highly valued by organisations for a long time. However, the COVID-19 pandemic has highlighted their critical importance in ensuring business continuity, economic activity, and quality customer support. The pandemic has led to an increase in customer inquiries related to payment extensions, cancellations, and stock inquiries, each with varying degrees of urgency. To address this challenge, organisations have taken the opportunity to re-evaluate the function of contact centres and explore innovative solutions. Next-generation platforms that incorporate machine learning techniques and natural language processing, such as self-service voice portals and chatbots, are being implemented to enhance customer service. These platforms offer robust features that equip customer agents with the necessary tools to provide exceptional customer support. Through an extensive review of existing literature, this paper aims to uncover research gaps and explore the advantages of transitioning to a contact centre that utilises natural language solutions as the norm. Additionally, we will examine the major challenges faced by contact centre organisations and offer recommendations for overcoming them, ultimately expediting the pace of contact centre automation